[Python Algorithms for Beginners 11] How to Quickly Generate a Random List with NumPy

2023-05-05 16:13 | Source: compiled from the web

1 Introduction

Every algorithm discussed so far in this series operates on lists. When we evaluate an algorithm's performance, we also need fairly large lists to test with.

For example: a list of one million random integers between 1 and 100.

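NumPy is a commonly used module for generating random numbers. By default, however, NumPy returns an array (ndarray) rather than a list, so the result has to be converted afterwards.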
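As a minimal sketch of that conversion (the names `arr` and `lst` are illustrative and not from the original article), `ndarray.tolist()` turns the array into a built-in Python list:

```python
import numpy as np

# Draw five integers from 1 to 100; the upper bound of randint is exclusive.
arr = np.random.randint(1, 101, size=5)

# tolist() converts the ndarray into a built-in list of Python ints.
lst = arr.tolist()

print(type(arr).__name__)  # ndarray
print(type(lst).__name__)  # list
```

The list holds the same values as the array, so any list-based algorithm from earlier in the series can consume it directly.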

2 Random Number Generation in NumPy

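NumPy offers several commonly used functions for generating random numbers. The first is:

np.random.random(): returns a random float in the half-open interval [0, 1), printed with many decimal digits.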

First, import the NumPy module:

    # -*- coding: utf-8 -*-
    """
    Created on Tue Jun 15 00:24:10 2021
    @Software: Spyder
    @author: 盲区行者王
    """
    import numpy as np

Then generate a random float between 0 and 1:

    np.random.random()
    Out[4]: 0.42932504744000544

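Passing a fixed value to np.random.seed() makes the generator produce the same sequence on every run. (This is why the random numbers that programs generate are really "pseudorandom" numbers.)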

    np.random.seed(1898)
    np.random.random()
    Out[35]: 0.3085519817311221

    np.random.seed(1898)
    np.random.random()
    Out[36]: 0.3085519817311221  ## same seed, same random number
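NumPy's documentation recommends a `default_rng()` generator for new code. The sketch below shows the same reproducibility idea with that interface; the names `rng_a`, `rng_b`, `first`, and `second` are illustrative and not from the original article.

```python
import numpy as np

# Two generators built from the same seed produce the same stream.
rng_a = np.random.default_rng(1898)
rng_b = np.random.default_rng(1898)

first = rng_a.random(3)   # three floats in [0, 1)
second = rng_b.random(3)

print(np.array_equal(first, second))  # True
```

Each generator carries its own state, so seeding one does not affect other code that uses the global np.random functions.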

We can also use Python's built-in round() function to keep only four decimal places:

    round(np.random.random(), 4)
    Out[38]: 0.4342

For sorting algorithms, however, we usually work with integers, typically between 1 and 100. For that we can use:

np.random.randint()

Here int stands for integer.

Let's look at the help text for this function:

    help("numpy.random.randint")
    Help on built-in function randint in numpy.random:

    numpy.random.randint = randint(...) method of numpy.random.mtrand.RandomState instance
        randint(low, high=None, size=None, dtype=int)

        Return random integers from `low` (inclusive) to `high` (exclusive).

        Return random integers from the "discrete uniform" distribution of
        the specified dtype in the "half-open" interval [`low`, `high`). If
        `high` is None (the default), then results are from [0, `low`).

        .. note::
            New code should use the ``integers`` method of a ``default_rng()``
            instance instead; see `random-quick-start`.

        Parameters
        ----------
        low : int or array-like of ints
            Lowest (signed) integers to be drawn from the distribution (unless
            ``high=None``, in which case this parameter is one above the
            *highest* such integer).
        high : int or array-like of ints, optional
            If provided, one above the largest (signed) integer to be drawn
            from the distribution (see above for behavior if ``high=None``).
            If array-like, must contain integer values
        size : int or tuple of ints, optional
            Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
            ``m * n * k`` samples are drawn.  Default is None, in which case a
            single value is returned.
        dtype : dtype, optional
            Desired dtype of the result. Byteorder must be native.
            The default value is int.

            .. versionadded:: 1.11.0

        Returns
        -------
        out : int or ndarray of ints
            `size`-shaped array of random integers from the appropriate
            distribution, or a single such random int if `size` not provided.

        See Also
        --------
        random_integers : similar to `randint`, only for the closed
            interval [`low`, `high`], and 1 is the lowest value if `high` is
            omitted.
        Generator.integers: which should be used for new code.

        Examples
        --------
        >>> np.random.randint(2, size=10)
        array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0]) # random
        >>> np.random.randint(1, size=10)
        array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

        Generate a 2 x 4 array of ints between 0 and 4, inclusive:

        >>> np.random.randint(5, size=(2, 4))
        array([[4, 0, 2, 1], # random
               [3, 2, 2, 0]])

        Generate a 1 x 3 array with 3 different upper bounds

        >>> np.random.randint(1, [3, 5, 10])
        array([2, 2, 9]) # random

        Generate a 1 by 3 array with 3 different lower bounds

        >>> np.random.randint([1, 5, 7], 10)
        array([9, 8, 7]) # random

        Generate a 2 by 4 array using broadcasting with dtype of uint8

        >>> np.random.randint([1, 3, 5, 7], [[10], [20]], dtype=np.uint8)
        array([[ 8,  6,  9,  7], # random
               [ 1, 16,  9, 12]], dtype=uint8)

The help text spells out the signature clearly:

randint(low, high=None, size=None, dtype=int)
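The half-open interval in this signature is easy to trip over: `high` itself is never returned. The sketch below illustrates that with a large sample, and also shows the `endpoint=True` option of `Generator.integers`, which the help text recommends for new code. The variable names are illustrative assumptions.

```python
import numpy as np

np.random.seed(1898)

# high is exclusive: high=100 can only produce 1..99.
half_open = np.random.randint(low=1, high=100, size=100_000)

# Passing high=101 makes 100 a possible value.
closed = np.random.randint(low=1, high=101, size=100_000)

print(half_open.min(), half_open.max())
print(closed.min(), closed.max())

# Generator.integers accepts endpoint=True to include the upper bound.
rng = np.random.default_rng(1898)
inclusive = rng.integers(1, 100, size=100_000, endpoint=True)
print(inclusive.min(), inclusive.max())
```

With 100,000 draws, the first line should almost certainly print a maximum of 99, while the other two reach 100.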

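Suppose we want a one-dimensional array of one million random integers between 1 and 100. Because high is exclusive, we pass high=101 so that 100 can appear:

    a1 = np.random.randint(low=1, high=101, size=1000000)  ## one-dimensional array
    a1.tolist()  ## convert to a list; far too large to display in full
    ## part of the output:
    67, 32, 99, 99, 81, 13, 24, 32, 45, 24, 87, 62, 98,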

Alternatively, we can build the list directly with a list comprehension (again with 101 as the exclusive upper bound):

    list1 = [np.random.randint(1, 101) for _ in range(1000000)]  ## takes about 1 second

The result is exactly what we wanted: a list of one million integers between 1 and 100.
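The two approaches produce the same kind of list but are not equally fast. The sketch below times both on a smaller size (`n = 100_000` is an assumption chosen so it runs quickly; the names are illustrative). Exact timings depend on the machine, so none are shown here.

```python
import time
import numpy as np

n = 100_000  # smaller than one million so the comparison runs quickly

# One vectorized call, then a single conversion to a list.
start = time.perf_counter()
vectorized = np.random.randint(1, 101, size=n).tolist()
t_vectorized = time.perf_counter() - start

# One NumPy call per element inside a list comprehension.
start = time.perf_counter()
looped = [np.random.randint(1, 101) for _ in range(n)]
t_looped = time.perf_counter() - start

print(f"vectorized + tolist(): {t_vectorized:.4f} s")
print(f"list comprehension:    {t_looped:.4f} s")
```

The vectorized version is usually much faster, because the list comprehension pays the overhead of a separate function call for every element.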

3 Sorting One Million Numbers in Excel

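Excel should be able to sort one million integers. Since Excel 2007, a worksheet can hold simple data at this scale (up to 1,048,576 rows). We can use VBA to generate one million random integers. Note that Int(Rnd() * 100) yields values from 0 to 99 rather than 1 to 100:

    Sub dome()
        Dim arr(1 To 1000000, 1 To 1), i As Long
        Randomize
        For i = 1 To 1000000
            arr(i, 1) = Int(Rnd() * 100)  ' random integer from 0 to 99
        Next
        [a1].Resize(1000000, 1) = arr  ' write the whole array to column A at once
    End Sub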

The result in Excel 2019 is shown in the screenshot below:

[Screenshot: random integers generated with VBA]

Sort the column and click OK.

The sort took about 1.5 seconds, leaving the one million numbers in ascending order.

The minimum value is 0 and the maximum is 99.

4 Bubble Sort in Python

Next, let's try sorting one million integers in Python with bubble sort, a fairly elegant algorithm.

    import numpy as np

    np.random.seed(1898)

    def bs(lst):
        """Sort lst in place in ascending order using bubble sort."""
        ## print("Original list: ", lst)
        for loc in range(len(lst) - 1, 0, -1):  ## for 10 items, loc runs from 9 down to 1
            for i in range(loc):  ## if loc = 9, i runs from 0 to 8
                if lst[i] > lst[i + 1]:
                    lst[i], lst[i + 1] = lst[i + 1], lst[i]
            ## print("Pass", len(lst) - loc, ": ", lst)

    ## list1 = [10, 2, 5, 6, 8, 7, 9, 1, 3, 4]
    list1 = [np.random.randint(1, 101) for _ in range(1000000)]  ## takes about 3 seconds

Then run the bubble sort:

bs(list1)

The outcome, however, caught me off guard. Spyder froze:

[Screenshot: Spyder hung for a long time and eventually had to be force-quit]

Clearly, this bubble sort cannot cope with data at the million-element scale.
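The reason is the number of comparisons: this form of bubble sort always performs n(n-1)/2 of them. The sketch below counts them on a small list and extrapolates to one million elements. The helper `bubble_sort_count` is a hypothetical variant written for this illustration, not the article's `bs` function, and it uses the standard library's `random` module to stay self-contained.

```python
import random


def bubble_sort_count(values):
    """Bubble-sort a copy of values; return (sorted list, comparison count)."""
    items = list(values)
    comparisons = 0
    for last in range(len(items) - 1, 0, -1):
        for i in range(last):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items, comparisons


random.seed(1898)
sample = [random.randint(1, 100) for _ in range(1000)]
result, comparisons = bubble_sort_count(sample)

print(result == sorted(sample))  # True
print(comparisons)               # 499500, i.e. 1000 * 999 / 2

# Extrapolating the same formula to one million elements:
n = 1_000_000
print(n * (n - 1) // 2)          # 499999500000
```

Roughly 5 x 10^11 comparisons in pure Python would take far longer than an interactive session can tolerate. Python's built-in sorted(), which runs in O(n log n) time, is the practical choice at this scale.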

5 Conclusion

Using NumPy in Python to generate a random list of one million elements, we compared bubble sort with sorting in Excel and reached the following conclusions:

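Excel sorts million-scale data quite efficiently, although I do not yet know which algorithm it uses.

Bubble sort in Python struggles with data at the million-element scale.

NumPy plus a list comprehension is the most convenient way to generate a list of random numbers.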

Finally, some readers may wonder: since pandas is higher-level than NumPy, why not use pandas to generate a one-dimensional Series directly?

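pandas is indeed more "high-level" than NumPy, but many of its basic features still depend on NumPy. Wherever pandas appears, NumPy usually shows up too. A typical case of the two working together: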

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randint(0, 100, (100, 4)), columns=list('abcd'))
    df.head()
    Out[1]:
        a   b   c   d
    0  20  38  34  43
    1  80  10  90  52
    2  35  14  90   2
    3  70  15  86  72
    4  98  49  92  98

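Finally, I recommend two Python algorithm books that even complete beginners can follow (they are also the main references for this series). They can help newcomers build their programming skills quickly.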
